Accounting for Context in Randomized Trials after Assignment

Many preventive trials randomize individuals to an intervention condition that is then delivered in a group setting. Other trials randomize higher levels, say organizations, and then use learning collaboratives composed of multiple organizations to support improved implementation or sustainment. Other trials randomize or expand existing social networks and use key opinion leaders to deliver interventions through these networks. We use the term contextually driven to refer generally to such trials (traditionally referred to as clustering, where groups are formed either pre-randomization or post-randomization, i.e., a cluster-randomized trial), as these groupings or networks provide fixed or time-varying contexts that matter both theoretically and practically in the delivery of interventions. While such contextually driven trials can provide efficient and effective ways to deliver and evaluate prevention programs, they all require analytical procedures that take appropriate account of non-independence, something not always appreciated. Published analyses of many prevention trials have failed to take this into account. We discuss different types of contextually driven designs and then show that even small amounts of non-independence can inflate actual Type I error rates. This inflation leads to rejecting the null hypothesis too often and to erroneously concluding that there are significant differences between interventions when none exist. We describe a procedure to account for non-independence in the important case of a two-arm trial that randomizes units of individuals or organizations in both arms and then provides the active treatment in one arm through groups formed after assignment. We provide sample code in multiple programming languages to guide the analyst, distinguish diverse contextually driven designs, and summarize implications for multiple audiences.
Supplementary Information The online version contains supplementary material available at 10.1007/s11121-022-01426-9.


Appendix 1. Technical Details Regarding Individually Randomized Group Treated Designs
The following sections provide technical descriptions that fully specify statements in the text but are not essential for general understanding.
Appendix 1a. Estimation of Variance Due to Grouping After Randomization Occurring in a Single Arm of a Two-Arm Trial
Here we discuss some subtleties regarding the estimation of variation in intervention impact across groups formed after assignment. If there are multiple groups in the one arm, and each group comprises different individuals, then one can treat the groups as independent of one another. First define the sample average in each group $g$ as $\bar{Y}_g$, $g = 1, \ldots, G$ ($G > 1$), the total average as $\bar{Y}$, and the within-group sample variance for group $g$ as $s_g^2$. If we assume that individual variation is the same within every group and that the group size $N$ does not vary, then the combined within variance, $W = \sum_{g=1}^{G} s_g^2 / G$, and the between variance, $B = \sum_{g=1}^{G} (\bar{Y}_g - \bar{Y})^2 / (G - 1)$, can be combined so that $W$ is an unbiased estimate of $\sigma_W^2$, the population within variance, and $B - W/N$ is an unbiased estimate of $\sigma_B^2$, the between-group variance. All this leads to appropriate testing of the mean in one arm with multiple groups against the overall mean in the other arm that has no groups. However, if the group intervention is delivered to everyone in the one arm at the same time, so that $G = 1$, $B$ is undefined because of the denominator $G - 1$.
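The moment estimates described above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming equal group sizes and more than one group; the function name `variance_components` and the toy data are hypothetical and do not come from the original analysis.

```python
from statistics import mean, variance


def variance_components(groups):
    """Moment estimates of within- and between-group variance.

    `groups` is a list of equal-sized lists of outcomes, one list per group
    formed after assignment in the grouped arm (requires G > 1).
    """
    G = len(groups)
    if G < 2:
        raise ValueError("B is undefined with a single group (denominator G - 1)")
    N = len(groups[0])
    if any(len(g) != N for g in groups):
        raise ValueError("this sketch assumes a common group size N")
    # W: average of the within-group sample variances
    W = mean([variance(g) for g in groups])
    # B: sample variance of the group means (divisor G - 1)
    B = variance([mean(g) for g in groups])
    # B - W/N estimates the between-group variance; it can be negative in small samples
    sigma2_between = B - W / N
    return W, B, sigma2_between


# Made-up toy data: three groups of three subjects each
W, B, s2b = variance_components([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(W, B, s2b)
```

The function raises an error when only one group is supplied, mirroring the point that B cannot be computed when G = 1.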
Even though direct estimation of intervention variation by group cannot be done with only one group, it is possible to estimate the variance of the intervention effect across groups indirectly, provided one is willing to make the strong assumption that the variance for each person is the same in both arms of the trial. With this assumption, a formula for estimating the ICC $= \sigma_B^2 / (\sigma_B^2 + \sigma_W^2)$ is ( � − � � ) � −( −1) � � ; here the two variance estimates are the standard formulae for standard errors of the means in the grouped and non-grouped arms, and $N$ is the total number of subjects in the grouped arm. Because this estimate is based on a single degree of freedom, it is imprecise.

Appendix 1b. Specification of Model for IRGT Simulation Studies
Formally, the model generating the data we used for examining the consequences of incorrectly specifying the random effects in an IRGT model with a large number of groups and subjects can be written as follows.
Let i = 0 , 1 , … , 200 index the group (0 represents control, and 1 or larger represents an intervention group), and let j index subjects, with j = 1 , …, 8,000 for control and j = 1 , … , 40 for each i larger than 0 (group treatment). Distributional assumptions are provided below, with all error terms being independent of one another.
(1) $Y_{ij} = \alpha + \beta\, Tx_i + \varepsilon_i + \delta_{ij}$
$j = 1, \ldots, 8{,}000$ for $i = 0$; $j = 1, \ldots, 40$ for $i = 1, \ldots, 200$
$Tx_0 = 0$, $Tx_i = 1$ for $i = 1, \ldots, 200$
$\varepsilon_i \sim N(0, \sigma_B^2)$ for $i = 1, \ldots, 200$

The six analyses in Table 1 in the main text are ordered by increasing number of terms in the model. The simplest analysis, displayed in Row 1 of Table 1, erroneously ignores grouping effects completely; that is, it is a model with only fixed and no random effects. This is often the way that IRGT analyses are performed, but we will see that it produces erroneous findings. The second analysis is a mixed-effects model (one with both fixed and random terms) that represents the IRGT model correctly (i.e., with Equation 1), with a random effect only for the intervention condition delivered in groups (Row 2). The third analysis involves a mixed-effects model that naïvely includes a common random effect across both intervention conditions without directly accounting for the IRGT structure (Row 3). The fourth analysis provides for distinct variances by treatment group but ignores grouping entirely (Row 4). The fifth incorporates a common intercept random effect for everyone plus a random effect for the treatment delivered in a group setting (Row 5). The sixth analysis includes two distinct random effects, one for each intervention condition (Row 6).
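To make the consequence of ignoring grouping concrete, the following Python sketch simulates data from a scaled-down version of Equation 1 under the null hypothesis (β = 0) and compares two tests. It is not the simulation reported in Table 1, and several things are assumptions made for illustration: the design is shrunk to 20 groups of 40 and 800 controls, the variances (0.05 between, 0.95 within) are hypothetical, the individual errors δij are taken to be normal with variance σW² as in Equation 2, and the "group-aware" test is a simple approximation based on group means rather than the mixed-effects model of Row 2.

```python
import math
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_type1(reps=1000, n_groups=20, group_size=40, n_control=800,
                   sigma2_b=0.05, sigma2_w=0.95, seed=12345):
    """Return (naive, group-aware) rejection rates when the true effect is zero."""
    rng = random.Random(seed)
    sd_b, sd_w = math.sqrt(sigma2_b), math.sqrt(sigma2_w)
    z_crit = 1.96    # two-sided 5% normal critical value
    t_crit = 2.093   # two-sided 5% t critical value for 19 df (n_groups - 1 = 19)
    naive_rej = aware_rej = 0
    for _ in range(reps):
        # Controls: individual error only (no group effect)
        control = [rng.gauss(0.0, sd_w) for _ in range(n_control)]
        # Treated: a shared group effect plus individual error
        groups = []
        for _ in range(n_groups):
            eps = rng.gauss(0.0, sd_b)
            groups.append([eps + rng.gauss(0.0, sd_w) for _ in range(group_size)])
        treated = [y for g in groups for y in g]
        diff = _mean(treated) - _mean(control)
        var_control_mean = _var(control) / n_control
        # Naive test: treats all treated subjects as independent
        se_naive = math.sqrt(_var(treated) / len(treated) + var_control_mean)
        # Group-aware test: variance of the treated mean from the spread of group means
        group_means = [_mean(g) for g in groups]
        se_aware = math.sqrt(_var(group_means) / n_groups + var_control_mean)
        if abs(diff) / se_naive > z_crit:
            naive_rej += 1
        if abs(diff) / se_aware > t_crit:
            aware_rej += 1
    return naive_rej / reps, aware_rej / reps


naive_rate, aware_rate = simulate_type1()
print(f"naive Type I error: {naive_rate:.3f}; group-aware: {aware_rate:.3f}")
```

With an ICC of only 0.05 in the grouped arm, the naive test should reject a true null far more often than the nominal 5%, while the group-aware test should stay close to or below the nominal level.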

Appendix 2. Computer Code and Output for IRGT Modeling of Univariate and Growth Modeling Outcomes in 6 Computer Programming Languages
This online resource provides code for a range of statistical packages for Individually Randomized Single Group Treatment (IRSGT or IRGT) modeling with a single normally distributed outcome and a linear growth model. The underlying models for four test files, covering the univariate and growth models, are available from the first author. In all these datasets we label the treatment variable Tx, which takes the value of 1 for active intervention and 0 for control. The variable Group takes the same value (i.e., 0) for everyone in the control group to signify that there is no grouping (i.e., individually randomized individually treated); for those in the group-delivered intervention condition, it takes the value of each unit's group identity (i.e., 1 , …, G).

Univariate Modeling of IRSGT
In the univariate case we generate data from Equation (1) in the main text, copied below.
(2) $Y_{ij} = \beta_0 + \beta_1 Tx_i + \varepsilon_i + \delta_{ij}$, $j = 1, \ldots, N_i$, $i = 0, 1, \ldots, G$
$Tx_i = 0$ for $i = 0$ (controls), $Tx_i = 1$ for $i = 1, \ldots, G$
$\delta_{ij} \sim N(0, \sigma_W^2)$
$\varepsilon_i \sim N(0, \sigma_B^2)$ for $i = 1, \ldots, G$; $\varepsilon_i = 0$ for $i = 0$

Linear Growth Modeling of IRSGT
For linear growth modeling, we describe how the data would be organized when we use the "long form," in which each row corresponds to a unique subject by time combination. This format can be used for all statistical packages described below (R, SAS, SPSS, SuperMix, STATA); for Mplus we use the "wide format." We introduce two additional variables: Time is a non-negative variable that starts at 0 at baseline, and Subject indexes the individual. We let Y represent the outcome measure; for the growth model the repeated measures of Y on the same subject are distinguished by their Time.
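As a rough illustration of the long form, the Python sketch below expands one-row-per-subject ("wide") records into one row per subject by time combination. The variable names Subject, Time, Y, Tx, and Group follow the text; the wide column names Y0, Y1, Y2 and all data values are hypothetical and made up for this example.

```python
def wide_to_long(wide_rows, times):
    """Expand one-row-per-subject records into one row per subject-by-time."""
    long_rows = []
    for row in wide_rows:
        for t in times:
            long_rows.append({
                "Subject": row["Subject"],
                "Tx": row["Tx"],
                "Group": row["Group"],
                "Time": t,               # 0 at baseline
                "Y": row[f"Y{t}"],       # hypothetical wide columns Y0, Y1, ...
            })
    return long_rows


# Made-up records: one control (Group 0) and two treated subjects in different groups
wide = [
    {"Subject": 1, "Tx": 0, "Group": 0, "Y0": 10.2, "Y1": 10.9, "Y2": 11.5},
    {"Subject": 2, "Tx": 1, "Group": 1, "Y0": 9.8, "Y1": 9.1, "Y2": 8.7},
    {"Subject": 3, "Tx": 1, "Group": 2, "Y0": 10.5, "Y1": 10.0, "Y2": 9.2},
]
long_rows = wide_to_long(wide, times=[0, 1, 2])
for r in long_rows:
    print(r)
```

Each subject contributes one row per measurement occasion, and every control row carries Group = 0, consistent with the coding of Group described earlier.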
Note that the error term for Group-Slope, $e_g^{Slope}$, is only present for the intervention group, making this an IRSGT. In our simulated data, none of the groups differ at baseline. If there happen to be variations at baseline, which might occur if enrollment varies over time, it would be necessary to include a random effect that is equivalent for those randomized at the same time to the control group. In our dataset we have added variables called Cohort, which distinguishes those in the controls who correspond to each Group, and CohTx, an indicator of each cohort by treatment combination. CohTx is used in the control condition as an artificial, zero-variance component.

Coding for Univariate Model
For each of the statistical programs below, we provide a minimal statement to run both the Univariate and the Growth models themselves; if needed, a preface identifies the minimal code needed to set up the model and a postface identifies the minimal code needed to output the results. Two test datasets, one called Univariate.csv and one called Growth.csv, are available for downloading and testing.
The following provide minimal code to specify the fixed and random effects, type of model, and output.
Code to read and organize the data is not provided. All of the programs account for random group effects limited to the active intervention group as random slopes, so that the controls, having Tx = 0, are ignored at this level. To specify the optimization function (e.g., marginal maximum likelihood or REML), the nonlinear optimization algorithm and convergence criteria (…

Appendix 3. Classification and Illustrations of Contextually Driven Intervention Trials
Table 3 provides a representative list of contextually driven trial designs. For each trial design considered, we describe where and when the random assignment occurs, how and when groups or networks are formed or changed, and a verbal description of the random effects. Examples are provided for each class. To focus on the core design issues in such designs, this table ignores common design strategies such as blocking (e.g., school is considered a blocking factor when two interventions are assigned to different classes within each school), cross-over designs (e.g., in a stepped-wedge or more general rollout design where each unit is randomized to the timing of when its intervention condition changes (Wyman, Henry, Knoblauch, & Brown, 2015)), and multiple levels of randomization (e.g., a split-plot design where different interventions are randomized at classroom and school levels). Given the popularity of mixed effects modeling (i.e., inclusion of both fixed effects for intervention and covariates and random effects to account for clustering), in Table 3 we provide brief notes on common specifications of random effects. Virtually all these designs can also incorporate heterogeneous variance components across the two arms and, for many, other analysis methods may also be appropriate (e.g., Generalized Estimating Equations, Bayesian methods). Some common design names have been modified to make them more precise (e.g., using Single or Both to distinguish what occurs in one or both arms of the trial), and we also use the word treatment to include prevention.
The first (Row 1), a Group Randomized Trial (GRT) (Murray, 1998), involves a head-to-head comparison of two interventions, where groups of individuals already exist (e.g., schools), and these groups are randomized to receive one or the other of these conditions. A recent example is the Wingman Connect Trial (Wyman et al., 2020), which tested a novel intervention focused on preventing suicide and depressive symptoms in new US Air Force Airmen-in-Training against a stress management active control condition. For both intervention conditions, all components of their respective interventions were delivered in existing Airmen technical training classes (these are the pre-existing groups). The trial randomized 215 training classes, of average size 7, to receive either Wingman Connect or stress management. As the responses of individuals within the same group (i.e., class) may depend on each other, a correct mixed effect analysis of a GRT is one that includes at least one random effect that accounts for group, or a group random effect for each arm if their variances are different. If we were to ignore how the responses of individuals within groups (i.e., classes) correlate, we would reject the null hypothesis more often than appropriate.
In the second row of Table 3 is an Individually Randomized Both Groups Treated (IRBGT) trial. As the name indicates, the only difference from a GRT is that here assignment to intervention condition is at the individual level, followed by forming groups that receive the same intervention. An example is the comparison of a group-based mindfulness versus group-based present centered therapy trial in the Veterans Administration (Polusny et al., 2015). In this trial, a total of 116 veterans with post-traumatic stress disorder were randomly assigned to one of these interventions; both interventions were delivered in a group format. This type of trial is very similar to GRTs; one difference, however, is that testing for baseline equivalence on individual characteristics in a GRT should include a random effect for grouping, while this baseline equivalence test for an IRBGT may ignore grouping (unless subject enrollment varies over time).
In the third row we present an Individually Randomized Single Group Treated (IRSGT) Trial. With all individuals randomized to two intervention conditions, one condition is delivered in a group setting, and the other is delivered individually. In the literature this design is most often referred to as an IRGT; we include the word "Single" to specify that only one arm is delivered in a group setting and that the design therefore differs from the IRBGT trial described above. An IRSGT is also known as a Partially Clustered Design (H. Li & Hedeker, 2017). Specific guidance on completing a CONSORT statement for this design is available (Boutron, Moher, Altman, Schulz, & Ravaud, 2008). In these designs, eligible individuals are continually randomized to condition, and once sufficient numbers are available to form a group in the relevant single arm, that intervention, as well as the comparison condition (which is administered individually), begins. An example of a trial using this design is the Prevention of Depression Study (PODS), which randomly assigned 316 youth from four locations to a cognitive behavioral group-based prevention program consisting of 8 weekly sessions followed by 6 monthly sessions administered in a group, or individualized usual care (Garber et al., 2009).
Thus, one arm experienced all the intervention through their respective group. For IRSGT trials, traditional intent-to-treat analysis is used even if some individuals drop out before the group or comparison condition begins. Many of these trials include random effects to account for repeated measures (e.g., growth models) and clustering of family members (Brent et al., 2015), but surprisingly few of these trials account for nonindependence of individuals within the same group (Pals et al., 2008).
One ongoing IRSGT study that has transitioned from delivery to a group at one location to the use of virtual groups in order to decrease COVID-19 exposure is the M-BODY trial (Burnett-Zeigler et al., Submitted for Publication), which compares a group-based mindfulness intervention to reduce stress and depression among African American adults with usual care. In this case, the transition from a traditional group to a virtual group setting for the mindfulness arm does not change the planned analysis, which would include a random effect for group. However, in analysis, we could investigate whether the face-to-face versus virtual groups have different means and variances.
One variation on the traditional IRSGT trial is the Place Randomized Single Group Treated (PRSGT) trial shown in Row 4 of Table 3. Instead of randomizing individuals and grouping them, we randomize places, sites, or settings to one of two interventions or implementation strategies, one of which combines them into larger groups. This type of design has been used to test two head-to-head implementation strategies in 51 counties, one involving an implementation strategy including each county's service systems and the other a team-based approach that combines 6-8 counties together into a learning collaborative (Chamberlain et al., 2008; Chamberlain & Reid, 1998). As the same underlying evidence-based intervention, Multidimensional Treatment Foster Care (MTFC) (Chamberlain, Leve, & DeGarmo, 2007), was implemented in both arms, this implementation trial tested whether the learning collaborative improved the quality, delivery, and speed of implementation compared to a strategy that facilitated MTFC's delivery within a single county. Like the IRSGT, an appropriate analysis of this trial required the inclusion of a random effect to account for non-independence due to the learning collaboratives.
Another variation on a traditional IRSGT trial occurs when only part of the intervention is delivered in a group setting, which we call an Individually Randomized Single Group and Individual Treatment (IRSGIT) trial. An example is Familias Unidas, an intervention aimed at preventing the target Hispanic adolescent's substance misuse and HIV sexual risk behavior. Familias Unidas uses both parent groups and individual family intervention sessions with a parent and the target youth. In these Familias Unidas trials (Prado et al., 2016; Prado & Pantin, 2011), there have been 8 parent sessions delivered in a group setting that uses participatory learning through dialogue rather than instruction, followed by 4 family sessions, which involve a facilitator supporting an individual parent and adolescent. In these interventions with both group and single family components, it is still appropriate to include a single random effect in that arm to account for the parent component delivered in a group setting.
We note that a new version of Familias Unidas, e-Familias Unidas, is now being tested. This new version is conducted fully virtually. It uses a telenovela format to simulate a group rather than involve groups of parents meeting together. This new version allows parents and youth to view material (Prado et al., 2019) on their own schedule, and it also retains the individualized sessions with the parent and youth, but delivered remotely rather than in the home. Because all the components are based on the individual family, there is no need to account for non-independence with a group random effect.
An Individually Randomized Single Rolling Group Treatment (IRSRGT) trial (Row 5) differs from IRSGTs in the way that individuals enter and exit groups, which exist only in the active intervention arm.
Instead of having new enrollees wait until enough eligible individuals assigned to the active arm are available to form a new group, they immediately enter an existing group. The curriculum is adapted to address entrances and exits in a rolling fashion. Thus, the composition of the group, and consequently each person's exposure, varies over time. An example of this rolling group design is the BRIGHT trial of a cognitive behavioral intervention to address drug abuse (Watkins et al., 2011). In this quasi-experiment involving 299 residential clients, individuals could enter the group at the beginning of each of the four modules (thoughts, activities, people, and substance abuse), each of which lasted 2 weeks with 2 sessions a week. One way to capture non-independence is to account for cross-classified random effects for each session and attribute each pair's covariance to the sum of the random effects that are shared. A simpler analytical approach is to posit a variance-covariance matrix that accounts for the proportion of sessions that are shared in common between each pair of individuals. Here we would model the mean structure to depend on intervention condition and individual level covariates, while the variance-covariance matrix across all subjects in the control condition is a variance times an identity matrix (i.e., forcing independence). In the rolling group condition the variance-covariance matrix has two parts: one is the same covariance matrix as that for controls, and the second is a new variance times a correlation matrix in which the off-diagonal value for subjects i and j is their proportion of shared sessions. Methodologic details on analyzing IRSRGTs using these ideas are less developed than other models in Table 3. An alternative is to use Bayesian methods to account for these multiple membership multiple classification models (Browne, Goldstein, & Rasbash, 2001).
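The two-part variance-covariance structure described for the rolling group arm can be sketched as follows. This Python sketch is illustrative only and rests on stated assumptions: the "proportion of shared sessions" for a pair is taken to be the number of sessions both attended divided by the number attended by either (other definitions are possible), the variance values are hypothetical, and the function names are invented for this example.

```python
def shared_session_correlation(attendance):
    """Correlation-like matrix from per-subject sets of attended session ids.

    Assumption: the off-diagonal entry for subjects i and j is
    |sessions shared| / |sessions attended by either|.
    """
    n = len(attendance)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        R[i][i] = 1.0
        for j in range(i + 1, n):
            union = attendance[i] | attendance[j]
            shared = attendance[i] & attendance[j]
            R[i][j] = R[j][i] = len(shared) / len(union) if union else 0.0
    return R


def rolling_group_covariance(attendance, sigma2_w, sigma2_g):
    """Covariance for the rolling-group arm: sigma2_w * I + sigma2_g * R."""
    R = shared_session_correlation(attendance)
    n = len(R)
    return [[(sigma2_w if i == j else 0.0) + sigma2_g * R[i][j] for j in range(n)]
            for i in range(n)]


# Made-up attendance: subjects 0 and 1 overlap in sessions 3 and 4; subject 2 overlaps with no one
attendance = [{1, 2, 3, 4}, {3, 4, 5, 6}, {7, 8}]
V = rolling_group_covariance(attendance, sigma2_w=1.0, sigma2_g=0.2)
for row in V:
    print([round(v, 4) for v in row])
```

For the control arm the corresponding matrix is simply the individual variance times an identity matrix, so no attendance information is needed there.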
Individually Randomized Network Treated (IRNT) trials (Row 6) represent an intervention in which a participant's exposure may vary by their location in a network. An example is the HOPE Trial that uses trained peer leaders to deliver HIV prevention messages in newly formed social media networks (Young et al., 2015). Both peer leaders and community members who are males having sex with males (MSM) are randomized to these new networks in the HIV intervention arm, where peer leaders deliver general health messages, or serve a similar role in a comparison arm. Peer leaders are recruited, randomly assigned to treatment condition, then randomly assigned within clusters so that peer leaders are different across clusters.
Participants are recruited in waves, such that only after a sufficient number of participants are enrolled and complete their baseline survey are the participants assigned to clusters; then a new wave of recruitment begins (Young et al., 2013). The hypothesized mediator of the target HIV prevention behaviors (e.g., self-testing) is the number of new ties among community members, so one's position in the network can impact exposure to these messages. Inclusion of a random effect to account for the different independent networks is one important component of the statistical analysis. One consideration in the use of social networks as a way to conduct randomized experiments is that changes in a platform's functionality over time can greatly affect the experience of an eHealth intervention, in which case the analytic approach should address such changes (D. H. Li et al., 2019).
A similar network intervention to an IRNT is a Place Randomized Network Treated Trial (Row 7); here already-formed places are randomized to condition, and the intervention is delivered with potentially different exposure based on one's position in that network. An example is a peer-led Sources of Strength youth suicide prevention program tested in randomly assigned high schools against a standard setting in comparable schools.
Student peer leaders are nominated in schools assigned to Sources of Strength to cover as much of the friendship network as possible in a school, but exposure to messages from these peer leaders is often higher among youth who are in the center rather than periphery of the network because they are more likely to have friendship ties to one or more peer leaders (Pickering et al., 2018).
Spillover Trials (Row 8) deliberately test whether an intervention that targets one individual within a group has additional effects on others within the same group. In contrast to trials where spillover from one intervention arm to another is considered a source of contamination that threatens the trial's integrity, spillover trials are designed to have effects beyond those directly touched by the intervention. As an example, the Philadelphia School Absenteeism Trial randomized parents of youth who were absent from school to either receive or not receive a letter. The effects of this brief intervention were evaluated not only on the target student but also on their siblings, using the family as a group. Thus, each arm of the trial included youth nested in families, and the analyses that evaluated their impact included random effects at the family level and fixed effects on the focal youth and siblings.